class: center, middle, inverse, title-slide .title[ # Week 1: Forking paths and sampling ] --- # Counting possibilities Suppose we have a bag that contains 4 marbles, and these marbles come in two colors: blue and white. We don't know how many of the marbles are blue and how many are white. There are therefore five possibilities: <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-2-1.png" width="504" /> --- Suppose we draw three marbles from the bag. What is every possible outcome? <!-- --> --- We draw three marbles from the bag: <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-4-1.png" width="504" /> --- Now we count how frequently this exact sequence arises: <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-5-1.png" width="504" /> --- This is only one of the possible combinations of white and blue marbles in the bag, so we need to recreate this for all possibilities. Because there's at least one blue and at least one white marble, we can rule out the possibilities that are all white and all blue, leaving us with three possible combinations of white and blue marbles. .pull-left[ <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-6-1.png" width="504" /> ] .pull-right[ <div class="tabwid"><style>.cl-3abd51e0{}.cl-3aba506c{font-family:'Helvetica';font-size:11pt;font-weight:normal;font-style:normal;text-decoration:none;color:rgba(0, 0, 0, 1.00);background-color:transparent;}.cl-3abb3bd0{margin:0;text-align:center;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-3abb45d0{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 1.5pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3abb45da{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 1.5pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3abb45db{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3abb45e4{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3abb45e5{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3abb45ee{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}</style><table data-quarto-disable-processing='true' class='cl-3abd51e0'><thead><tr style="overflow-wrap:break-word;"><th class="cl-3abb45d0"><p class="cl-3abb3bd0"><span class="cl-3aba506c">conjecture</span></p></th><th class="cl-3abb45da"><p class="cl-3abb3bd0"><span class="cl-3aba506c">Ways to produce [w b w]</span></p></th></tr></thead><tbody><tr style="overflow-wrap:break-word;"><td class="cl-3abb45db"><p class="cl-3abb3bd0"><span class="cl-3aba506c">[w w w w]</span></p></td><td class="cl-3abb45e4"><p class="cl-3abb3bd0"><span class="cl-3aba506c">0 * 4 * 0 = 0</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3abb45db"><p class="cl-3abb3bd0"><span class="cl-3aba506c">[b w w w]</span></p></td><td class="cl-3abb45e4"><p class="cl-3abb3bd0"><span class="cl-3aba506c">1 * 3 * 1 = 3</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3abb45db"><p class="cl-3abb3bd0"><span class="cl-3aba506c">[b b w w]</span></p></td><td class="cl-3abb45e4"><p class="cl-3abb3bd0"><span class="cl-3aba506c">2 * 2 * 2 = 8</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3abb45db"><p class="cl-3abb3bd0"><span class="cl-3aba506c">[b b b w]</span></p></td><td class="cl-3abb45e4"><p class="cl-3abb3bd0"><span class="cl-3aba506c">3 * 1 * 3 = 9</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3abb45e5"><p class="cl-3abb3bd0"><span class="cl-3aba506c">[b b b b]</span></p></td><td class="cl-3abb45ee"><p class="cl-3abb3bd0"><span class="cl-3aba506c">4 * 0 * 4 = 0</span></p></td></tr></tbody></table></div> ] --- <div class="tabwid"><style>.cl-3b0dbdd8{}.cl-3b0af242{font-family:'Helvetica';font-size:11pt;font-weight:normal;font-style:normal;text-decoration:none;color:rgba(0, 0, 0, 1.00);background-color:transparent;}.cl-3b0c2d56{margin:0;text-align:center;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-3b0c3aa8{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 1.5pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b0c3ada{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 1.5pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b0c3ae4{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b0c3af8{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b0c3b0c{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b0c3b20{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}</style><table data-quarto-disable-processing='true' class='cl-3b0dbdd8'><thead><tr style="overflow-wrap:break-word;"><th class="cl-3b0c3aa8"><p class="cl-3b0c2d56"><span class="cl-3b0af242">conjecture</span></p></th><th class="cl-3b0c3ada"><p class="cl-3b0c2d56"><span class="cl-3b0af242">Ways to produce [w b w]</span></p></th></tr></thead><tbody><tr style="overflow-wrap:break-word;"><td class="cl-3b0c3ae4"><p class="cl-3b0c2d56"><span class="cl-3b0af242">[w w w w]</span></p></td><td class="cl-3b0c3af8"><p class="cl-3b0c2d56"><span class="cl-3b0af242">0 * 4 * 0 = 0</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b0c3ae4"><p class="cl-3b0c2d56"><span class="cl-3b0af242">[b w w w]</span></p></td><td class="cl-3b0c3af8"><p class="cl-3b0c2d56"><span class="cl-3b0af242">1 * 3 * 1 = 3</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b0c3ae4"><p class="cl-3b0c2d56"><span class="cl-3b0af242">[b b w w]</span></p></td><td class="cl-3b0c3af8"><p class="cl-3b0c2d56"><span class="cl-3b0af242">2 * 2 * 2 = 8</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b0c3ae4"><p class="cl-3b0c2d56"><span class="cl-3b0af242">[b b b w]</span></p></td><td class="cl-3b0c3af8"><p class="cl-3b0c2d56"><span class="cl-3b0af242">3 * 1 * 3 = 9</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b0c3b0c"><p class="cl-3b0c2d56"><span class="cl-3b0af242">[b b b b]</span></p></td><td class="cl-3b0c3b20"><p class="cl-3b0c2d56"><span class="cl-3b0af242">4 * 0 * 4 = 0</span></p></td></tr></tbody></table></div> From these counts, we can see intuitively that the sequence is more likely to occur when there are 3 blue and 1 white marble, although also likely to occur when there are 2 of each. Of course, this assumes what? Counts are nice, but they don't translate well across studies/data. - Relative size matters (3, 8, and 9 are no different from 30, 80, and 90). - As the amount of data grows, counts will quickly become very very very large. ??? assumes that all the options are equally likely to begin with. --- ## From counts to probabilities. So let's transform these counts into **probabilities**: $$ \text{probability of }p = \frac{\text{ways } p \text{ can produce D}}{\text{sum of all ways} } $$ <div class="tabwid"><style>.cl-3b170dd4{}.cl-3b13f2fc{font-family:'Helvetica';font-size:11pt;font-weight:normal;font-style:normal;text-decoration:none;color:rgba(0, 0, 0, 1.00);background-color:transparent;}.cl-3b150570{margin:0;text-align:center;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-3b1510a6{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 1.5pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b1510ba{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 1.5pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b1510c4{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b1510e2{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b1510e3{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b151100{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}</style><table data-quarto-disable-processing='true' class='cl-3b170dd4'><thead><tr style="overflow-wrap:break-word;"><th class="cl-3b1510a6"><p class="cl-3b150570"><span class="cl-3b13f2fc">conjecture</span></p></th><th class="cl-3b1510a6"><p class="cl-3b150570"><span class="cl-3b13f2fc">proportion W</span></p></th><th class="cl-3b1510ba"><p class="cl-3b150570"><span class="cl-3b13f2fc">Ways to produce [w b w]</span></p></th><th class="cl-3b1510a6"><p class="cl-3b150570"><span class="cl-3b13f2fc">probability</span></p></th></tr></thead><tbody><tr style="overflow-wrap:break-word;"><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">[w w w w]</span></p></td><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">0.00</span></p></td><td class="cl-3b1510e2"><p class="cl-3b150570"><span class="cl-3b13f2fc">0 * 4 * 0 = 0</span></p></td><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">0.00</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">[b w w w]</span></p></td><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">0.25</span></p></td><td class="cl-3b1510e2"><p class="cl-3b150570"><span class="cl-3b13f2fc">1 * 3 * 1 = 3</span></p></td><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">0.15</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">[b b w w]</span></p></td><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">0.50</span></p></td><td class="cl-3b1510e2"><p class="cl-3b150570"><span class="cl-3b13f2fc">2 * 2 * 2 = 8</span></p></td><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">0.40</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">[b b b w]</span></p></td><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">0.75</span></p></td><td class="cl-3b1510e2"><p class="cl-3b150570"><span class="cl-3b13f2fc">3 * 1 * 3 = 9</span></p></td><td class="cl-3b1510c4"><p class="cl-3b150570"><span class="cl-3b13f2fc">0.45</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b1510e3"><p class="cl-3b150570"><span class="cl-3b13f2fc">[b b b b]</span></p></td><td class="cl-3b1510e3"><p class="cl-3b150570"><span class="cl-3b13f2fc">1.00</span></p></td><td class="cl-3b151100"><p class="cl-3b150570"><span class="cl-3b13f2fc">4 * 0 * 4 = 0</span></p></td><td class="cl-3b1510e3"><p class="cl-3b150570"><span class="cl-3b13f2fc">0.00</span></p></td></tr></tbody></table></div> --- # Globe Let's recreate the 4-sided globe example from McElreath's lecture: W L W W W L W L W ``` r sample <- c("W", "L", "W", "W", "W", "L", "W", "L", "W") W <- sum(sample == "W") #number of W observed L <- sum(sample == "L") #number of L observed ``` We also specify some (small, discrete) number of possible hypotheses regarding the true proportion of water on the globe. ``` r p <- c(0, .25, .5, .75, 1) #hypothetical proportions of W ``` --- Finally, we count the number ways to produce our sample within each of these proportions: $$ \text{Ways} = 4p^W \times4(1-p)^L $$ ``` r ways <- sapply(p, function(q) (q*4)^W * ((1-q)*4)^L ) probability = ways/sum(ways) cbind( p, ways, probability ) ``` ``` ## p ways probability ## [1,] 0.00 0 0.00000000 ## [2,] 0.25 27 0.02129338 ## [3,] 0.50 512 0.40378549 ## [4,] 0.75 729 0.57492114 ## [5,] 1.00 0 0.00000000 ``` --- We can simulate many tosses of the globe with this function: ``` r # function to toss a globe covered p by water N times sim_globe = function( p=0.7 , N=9 ){ sample( x = c("W", "L"), # possible values size = N, # how many draws prob = c(p, 1-p), # probability of each possibility replace = TRUE # the same value can be drawn multiple times ) } sim_globe() ``` ``` ## [1] "L" "L" "W" "L" "W" "W" "W" "W" "W" ``` --- ## Exercise: Write function Write this function in R. (Don't copy and paste.) ``` r sim_globe = function( p=0.7 , N=9 ){ sample( x = c("W", "L"), size = N, prob = c(p, 1-p), replace = TRUE ) } ``` Test it out. - Show that it produces all `W` if you set `p` to 0. - Show that when `N` is very large, the proportion of water is extremely close to `p`. --- Now here is the function to compute the posterior: ``` r # function to compute posterior distribution compute_posterior = function( the_sample, poss = c(0, .25, .50, .75, 1) ){ W = sum(the_sample == "W") # number of W observed L = sum(the_sample == "L") # number of L observed num_sides = length(poss) - 1 ways = sapply( poss, function(q) (q*num_sides)^W * ((1-q)*num_sides)^L ) post = ways/sum(ways) bars = sapply(post, function(q) make_bar(q)) # this is from the rethinking package data.frame(poss, ways, post = round(post, 3), bars) } ``` --- In the lecture, RM uses this function to determine that, of the possibilities for the 4-sided globe, it makes the most sense for 3 of them to be water. ``` r compute_posterior(sample) ``` ``` ## poss ways post bars ## 1 0.00 0 0.000 ## 2 0.25 27 0.021 ## 3 0.50 512 0.404 ######## ## 4 0.75 729 0.575 ########### ## 5 1.00 0 0.000 ``` ## Exercise: Test compute posterior Test this sample on a dice with 10 sides. Test this sample on a dice with 20 sides. Test this sample on a dice with 100 sides. --- Test this sample on a dice with 10 sides. ``` r compute_posterior(sample, poss = seq(0, 1, length.out = 11)) ``` ``` ## poss ways post bars ## 1 0.0 0 0.000 ## 2 0.1 729 0.000 ## 3 0.2 32768 0.003 ## 4 0.3 250047 0.021 ## 5 0.4 884736 0.074 # ## 6 0.5 1953125 0.164 ### ## 7 0.6 2985984 0.251 ##### ## 8 0.7 3176523 0.267 ##### ## 9 0.8 2097152 0.176 #### ## 10 0.9 531441 0.045 # ## 11 1.0 0 0.000 ``` --- Test this sample on a dice with 20 sides. ``` r compute_posterior(sample, poss = seq(0, 1, length.out = 21)) ``` ``` ## poss ways post bars ## 1 0.00 0 0.000 ## 2 0.05 6859 0.000 ## 3 0.10 373248 0.000 ## 4 0.15 3581577 0.000 ## 5 0.20 16777216 0.001 ## 6 0.25 52734375 0.004 ## 7 0.30 128024064 0.011 ## 8 0.35 258474853 0.021 ## 9 0.40 452984832 0.037 # ## 10 0.45 707347971 0.058 # ## 11 0.50 1000000000 0.082 ## ## 12 0.55 1291467969 0.106 ## ## 13 0.60 1528823808 0.125 ### ## 14 0.65 1655595487 0.136 ### ## 15 0.70 1626379776 0.133 ### ## 16 0.75 1423828125 0.117 ## ## 17 0.80 1073741824 0.088 ## ## 18 0.85 651714363 0.053 # ## 19 0.90 272097792 0.022 ## 20 0.95 47045881 0.004 ## 21 1.00 0 0.000 ``` --- Test this sample on a dice with 100 sides. ``` r compute_posterior(sample, poss = seq(0, 1, length.out = 101)) ``` ``` ## poss ways post bars ## 1 0.00 0.000000e+00 0.000 ## 2 0.01 9.702990e+05 0.000 ## 3 0.02 6.023629e+07 0.000 ## 4 0.03 6.653386e+08 0.000 ## 5 0.04 3.623879e+09 0.000 ## 6 0.05 1.339648e+10 0.000 ## 7 0.06 3.875173e+10 0.000 ## 8 0.07 9.463180e+10 0.000 ## 9 0.08 2.041284e+11 0.000 ## 10 0.09 4.004785e+11 0.000 ## 11 0.10 7.290000e+11 0.000 ## 12 0.11 1.248896e+12 0.000 ## 13 0.12 2.034864e+12 0.000 ## 14 0.13 3.178468e+12 0.000 ## 15 0.14 4.789207e+12 0.000 ## 16 0.15 6.995268e+12 0.000 ## 17 0.16 9.943923e+12 0.000 ## 18 0.17 1.380155e+13 0.000 ## 19 0.18 1.875325e+13 0.000 ## 20 0.19 2.500211e+13 0.000 ## 21 0.20 3.276800e+13 0.000 ## 22 0.21 4.228604e+13 0.000 ## 23 0.22 5.380466e+13 0.000 ## 24 0.23 6.758327e+13 0.001 ## 25 0.24 8.388962e+13 0.001 ## 26 0.25 1.029968e+14 0.001 ## 27 0.26 1.251801e+14 0.001 ## 28 0.27 1.507132e+14 0.001 ## 29 0.28 1.798646e+14 0.002 ## 30 0.29 2.128938e+14 0.002 ## 31 0.30 2.500470e+14 0.002 ## 32 0.31 2.915529e+14 0.002 ## 33 0.32 3.376188e+14 0.003 ## 34 0.33 3.884258e+14 0.003 ## 35 0.34 4.441251e+14 0.004 ## 36 0.35 5.048337e+14 0.004 ## 37 0.36 5.706304e+14 0.005 ## 38 0.37 6.415522e+14 0.005 ## 39 0.38 7.175904e+14 0.006 ## 40 0.39 7.986880e+14 0.007 ## 41 0.40 8.847360e+14 0.007 ## 42 0.41 9.755717e+14 0.008 ## 43 0.42 1.070976e+15 0.009 ## 44 0.43 1.170672e+15 0.010 ## 45 0.44 1.274325e+15 0.011 ## 46 0.45 1.381539e+15 0.012 ## 47 0.46 1.491861e+15 0.013 ## 48 0.47 1.604777e+15 0.013 ## 49 0.48 1.719719e+15 0.014 ## 50 0.49 1.836061e+15 0.015 ## 51 0.50 1.953125e+15 0.016 ## 52 0.51 2.070186e+15 0.017 ## 53 0.52 2.186471e+15 0.018 ## 54 0.53 2.301170e+15 0.019 ## 55 0.54 2.413437e+15 0.020 ## 56 0.55 2.522398e+15 0.021 ## 57 0.56 2.627158e+15 0.022 ## 58 0.57 2.726808e+15 0.023 ## 59 0.58 2.820433e+15 0.024 ## 60 0.59 2.907125e+15 0.024 ## 61 0.60 2.985984e+15 0.025 # ## 62 0.61 3.056137e+15 0.026 # ## 63 0.62 3.116743e+15 0.026 # ## 64 0.63 3.167003e+15 0.027 # ## 65 0.64 3.206176e+15 0.027 # ## 66 0.65 3.233585e+15 0.027 # ## 67 0.66 3.248631e+15 0.027 # ## 68 0.67 3.250803e+15 0.027 # ## 69 0.68 3.239690e+15 0.027 # ## 70 0.69 3.214990e+15 0.027 # ## 71 0.70 3.176523e+15 0.027 # ## 72 0.71 3.124238e+15 0.026 # ## 73 0.72 3.058222e+15 0.026 # ## 74 0.73 2.978712e+15 0.025 # ## 75 0.74 2.886093e+15 0.024 ## 76 0.75 2.780914e+15 0.023 ## 77 0.76 2.663884e+15 0.022 ## 78 0.77 2.535875e+15 0.021 ## 79 0.78 2.397925e+15 0.020 ## 80 0.79 2.251233e+15 0.019 ## 81 0.80 2.097152e+15 0.018 ## 82 0.81 1.937184e+15 0.016 ## 83 0.82 1.772967e+15 0.015 ## 84 0.83 1.606258e+15 0.013 ## 85 0.84 1.438917e+15 0.012 ## 86 0.85 1.272880e+15 0.011 ## 87 0.86 1.110132e+15 0.009 ## 88 0.87 9.526768e+14 0.008 ## 89 0.88 8.024903e+14 0.007 ## 90 0.89 6.614821e+14 0.006 ## 91 0.90 5.314410e+14 0.004 ## 92 0.91 4.139767e+14 0.003 ## 93 0.92 3.104538e+14 0.003 ## 94 0.93 2.219176e+14 0.002 ## 95 0.94 1.490119e+14 0.001 ## 96 0.95 9.188649e+13 0.001 ## 97 0.96 5.009650e+13 0.000 ## 98 0.97 2.249024e+13 0.000 ## 99 0.98 7.086739e+12 0.000 ## 100 0.99 9.414801e+11 0.000 ## 101 1.00 0.000000e+00 0.000 ``` --- .pull-left[ This method is referred to as **grid approximation** -- approximating the continuous posterior distribution using a finite grid of parameter values. ``` r compute_posterior(sample, poss = seq(0, 1, length.out = 101)) %>% ggplot(aes(x = poss, y = ways)) + geom_line() + theme_minimal() ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-20-1.png" width="504" /> ] .pull-right[ This is the Beta distribution, which is the actual form of the posterior, if you were to perform the calculus. ``` r curve( dbeta(x, 6+1, 3+1), lty = 2, lwd = 3) ``` <!-- --> ] --- ## Exercise Compute and plot the distribution for each of the following sets of observations: - W W W - W W W L - L W W L W W W --- W W W ``` r c("W", "W", "W") %>% compute_posterior(poss = seq(0, 1, length.out = 101)) %>% ggplot(aes(x = poss, y = ways)) + geom_line() + theme_minimal() ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-22-1.png" width="504" /> --- W W W L ``` r c("W", "W", "W", "L") %>% compute_posterior(poss = seq(0, 1, length.out = 101)) %>% ggplot(aes(x = poss, y = ways)) + geom_line() + theme_minimal() ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-23-1.png" width="504" /> --- L W W L W W W ``` r c("L", "W", "W", "L", "W", "W", "W") %>% compute_posterior(poss = seq(0, 1, length.out = 101)) %>% ggplot(aes(x = poss, y = ways)) + geom_line() + theme_minimal() ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-24-1.png" width="504" /> --- ## Incorporating priors What is the prior distribution we used in this example? A uniform distribution. Let's go back to our marbles example. <div class="tabwid"><style>.cl-3b4b097c{}.cl-3b48ddf0{font-family:'Helvetica';font-size:11pt;font-weight:normal;font-style:normal;text-decoration:none;color:rgba(0, 0, 0, 1.00);background-color:transparent;}.cl-3b49c1c0{margin:0;text-align:center;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-3b49cbb6{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 1.5pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b49cbb7{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 1.5pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b49cbca{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b49cbde{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b49cbdf{width:1in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b49cbe8{width:2in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}</style><table data-quarto-disable-processing='true' class='cl-3b4b097c'><thead><tr style="overflow-wrap:break-word;"><th class="cl-3b49cbb6"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">conjecture</span></p></th><th class="cl-3b49cbb6"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">p</span></p></th><th class="cl-3b49cbb7"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">Ways to produce [w b w]</span></p></th><th class="cl-3b49cbb6"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">plausibility</span></p></th></tr></thead><tbody><tr style="overflow-wrap:break-word;"><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">[w w w w]</span></p></td><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0.00</span></p></td><td class="cl-3b49cbde"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0 * 4 * 0 = 0</span></p></td><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0.00</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">[b w w w]</span></p></td><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0.25</span></p></td><td class="cl-3b49cbde"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">1 * 3 * 1 = 3</span></p></td><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0.15</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">[b b w w]</span></p></td><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0.50</span></p></td><td class="cl-3b49cbde"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">2 * 2 * 2 = 8</span></p></td><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0.40</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">[b b b w]</span></p></td><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0.75</span></p></td><td class="cl-3b49cbde"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">3 * 1 * 3 = 9</span></p></td><td class="cl-3b49cbca"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0.45</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b49cbdf"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">[b b b b]</span></p></td><td class="cl-3b49cbdf"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">1.00</span></p></td><td class="cl-3b49cbe8"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">4 * 0 * 4 = 0</span></p></td><td class="cl-3b49cbdf"><p class="cl-3b49c1c0"><span class="cl-3b48ddf0">0.00</span></p></td></tr></tbody></table></div> We draw one more marble (blue). We can start our garden of forking paths over with 4 levels, or we can use our old plausibilities as our priors. --- <div class="tabwid"><style>.cl-3b5179e2{}.cl-3b4f4154{font-family:'Helvetica';font-size:11pt;font-weight:normal;font-style:normal;text-decoration:none;color:rgba(0, 0, 0, 1.00);background-color:transparent;}.cl-3b5026be{margin:0;text-align:center;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);padding-bottom:5pt;padding-top:5pt;padding-left:5pt;padding-right:5pt;line-height: 1;background-color:transparent;}.cl-3b50310e{width:0.75in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 1.5pt solid rgba(102, 102, 102, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b50310f{width:0.75in;background-color:transparent;vertical-align: middle;border-bottom: 0 solid rgba(0, 0, 0, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}.cl-3b503110{width:0.75in;background-color:transparent;vertical-align: middle;border-bottom: 1.5pt solid rgba(102, 102, 102, 1.00);border-top: 0 solid rgba(0, 0, 0, 1.00);border-left: 0 solid rgba(0, 0, 0, 1.00);border-right: 0 solid rgba(0, 0, 0, 1.00);margin-bottom:0;margin-top:0;margin-left:0;margin-right:0;}</style><table data-quarto-disable-processing='true' class='cl-3b5179e2'><thead><tr style="overflow-wrap:break-word;"><th class="cl-3b50310e"><p class="cl-3b5026be"><span class="cl-3b4f4154">conjecture</span></p></th><th class="cl-3b50310e"><p class="cl-3b5026be"><span class="cl-3b4f4154">p</span></p></th><th class="cl-3b50310e"><p class="cl-3b5026be"><span class="cl-3b4f4154">prior</span></p></th><th class="cl-3b50310e"><p class="cl-3b5026be"><span class="cl-3b4f4154">Ways to produce [b]</span></p></th><th class="cl-3b50310e"><p class="cl-3b5026be"><span class="cl-3b4f4154">prior * count</span></p></th><th class="cl-3b50310e"><p class="cl-3b5026be"><span class="cl-3b4f4154">posterior</span></p></th></tr></thead><tbody><tr style="overflow-wrap:break-word;"><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">[w w w w]</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.00</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.00</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.00</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.000</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">[b w w w]</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.25</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.15</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">1</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.15</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.065</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">[b b w w]</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.50</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.40</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">2</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.80</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.348</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">[b b b w]</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.75</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.45</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">3</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">1.35</span></p></td><td class="cl-3b50310f"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.587</span></p></td></tr><tr style="overflow-wrap:break-word;"><td class="cl-3b503110"><p class="cl-3b5026be"><span class="cl-3b4f4154">[b b b b]</span></p></td><td class="cl-3b503110"><p class="cl-3b5026be"><span class="cl-3b4f4154">1.00</span></p></td><td class="cl-3b503110"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.00</span></p></td><td class="cl-3b503110"><p class="cl-3b5026be"><span class="cl-3b4f4154">4</span></p></td><td class="cl-3b503110"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.00</span></p></td><td class="cl-3b503110"><p class="cl-3b5026be"><span class="cl-3b4f4154">0.000</span></p></td></tr></tbody></table></div> The posterior is the product of the prior and the count divided by the sum of the products. Or in other words: $$ P(H|D) = \frac{P(H)P(D|H)}{P(D)} $$ --- ## Exercise: Globe priors Return to the globe example. Assume a prior for `\(p\)` that is equal to zero when `\(p < 0.5\)` and is a positive constant when p `\(≥ 0.5\)`. Again compute and plot the grid approximate posterior distribution for each of the following sets of observations: - W W W - W W W L - L W W L W W W --- ``` r compute_posterior5 = function( the_sample, poss = c(0, .25, .50, .75, 1) ){ W = sum(the_sample == "W") # number of W observed L = sum(the_sample == "L") # number of L observed num_sides = length(poss) - 1 ways = sapply( poss, function(q) ifelse( q < .5, 0, (q*num_sides)^W * ((1-q)*num_sides)^L ) ) post = ways/sum(ways) bars = sapply(post, function(q) make_bar(q)) # this is from the rethinking package data.frame(poss, ways, post = round(post, 3), bars) } ``` --- W W W ``` r c("W", "W", "W") %>% compute_posterior5(poss = seq(0, 1, length.out = 101)) %>% ggplot(aes(x = poss, y = ways)) + geom_line() + theme_minimal() ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-28-1.png" width="504" /> --- W W W L ``` r c("W", "W", "W", "L") %>% compute_posterior5(poss = seq(0, 1, length.out = 101)) %>% ggplot(aes(x = poss, y = ways)) + geom_line() + theme_minimal() ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-29-1.png" width="504" /> --- L W W L W W W ``` r c("L", "W", "W", "L", "W", "W", "W") %>% compute_posterior5(poss = seq(0, 1, length.out = 101)) %>% ggplot(aes(x = poss, y = ways)) + geom_line() + theme_minimal() ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-30-1.png" width="504" /> --- # Computing the posterior Our compute posterior function works because it writes out in formulas we see with our eyes. We wrote a function to express the counting of ways we can see _W_ water and _L_ land given a proportion, _p_, of water on a 4-sided die: $$ \text{ways} = (p\times 4)^W( (1-p) \times4)^L $$ We even showed we can extend this for a die with more sides by changing 4 to `\(N\)` (number of sides). $$ \text{ways} = (p\times N)^W( (1-p) \times N)^L $$ Cool, but this is just a cute example. We're going to need more flexible ways of estimating posterior distributions for more complex examples, especially when we get into the multivariate parameter space. There are three main options for building the posterior: 1. Grid approximation (we'll use this today) 2. Quadratic approximation 3. MCMC --- ## Grid approximation ``` r # define grid p.grid <- seq(0, 1, length.out = 100) # define prior prior <- rep(1, 100) # compute likelihood at each value in grid likelihood <- dbinom(6, size = 9, prob = p.grid) # compute product of prior and likelihood unstd.posterior <- prior * likelihood # standardize posterior posterior <- unstd.posterior / sum(unstd.posterior) ``` --- ## Sampling from the posterior Think of the posterior as a bucket of values of possible parameter values. The posterior distribution represents the proportion of the bucket that each value takes up. We can sample from the posterior distribution: .pull-left[ ``` r sample_post = sample(x = p.grid, prob = posterior, size = 1e4, replace = T) plot(sample_post) ``` ] .pull-right[ <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-33-1.png" width="504" /> ] --- ``` r dens(sample_post) ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-34-1.png" width="504" /> --- ## Summarizing the posterior Now that we have a posterior, we can sample from it to ask questions. > What is the probability that the true proportion of water is less than .5? ``` r sum(sample_post < .5)/1e4 ``` ``` ## [1] 0.1669 ``` > What is the probability that the true proportion of water is between .5 and .75? ``` r sum(sample_post > .5 & sample_post < .75)/1e4 ``` ``` ## [1] 0.6161 ``` --- > What is 80 percentile interval? ``` r quantile( sample_post, c(.1, .9) ) ``` ``` ## 10% 90% ## 0.4545455 0.8080808 ``` ``` r rethinking::PI(sample_post, prob = .80) ``` ``` ## 10% 90% ## 0.4545455 0.8080808 ``` It's common in Bayesian statistics to report the **highest posterior density interval**, which is the narrowest interval containing the specified probability mass. ``` r rethinking::HPDI( sample_post, prob = .8) ``` ``` ## |0.8 0.8| ## 0.4646465 0.8181818 ``` --- These intervals are a convenient way to summarize the posterior distribution. RM suggests providing 67%, 89% and 97% intervals for all parameters of interest. Why those? - They are spaced enough to illustrate the shape of the distribution. - They are prime numbers. - Why not? ``` r rethinking::PI(sample_post, prob = c(.67)) ``` ``` ## 16% 84% ## 0.4949495 0.7777778 ``` ``` r rethinking::PI(sample_post, prob = c(.89)) ``` ``` ## 5% 94% ## 0.4040404 0.8484848 ``` ``` r rethinking::PI(sample_post, prob = c(.97)) ``` ``` ## 2% 98% ## 0.3232323 0.8989899 ``` --- ## Sampling for prediction Sampling implied observations is great for the following reasons: 1. Test model design. Specifically, testing priors. 2. Model checking. Verify that the model worked as intended. 3. Software validation. Can we verify that a known model is recovered? 4. Research design. Power analysis, but other things are possible. 5. Forecasting. Applied prediction but also model criticism are revision. --- ### How to simulate observations. Remember, `R` includes functions that can randomly simulate from known distributions. In the case of the globe tossing experiment, we have (1) two possible outcomes, (2) independent trials, (3) the same likelihood of Water on each trial, and (4) a known number of tosses. What kind of distribution describes all possible outcomes of our experiment? -- The binomial. ``` r dbinom(0:2, size=2, prob=.7) ``` ``` ## [1] 0.09 0.42 0.49 ``` --- We can random sample from a binomial distribution using `rbinom`: ``` r rbinom(n=1, size=2, prob=.7) ``` ``` ## [1] 2 ``` ``` r rbinom(n=10, size=2, prob=.7) ``` ``` ## [1] 2 2 1 1 0 2 1 1 2 1 ``` ``` r dummy <- rbinom(n=1e5, size=2, prob=.7) table(dummy)/1e5 ``` ``` ## dummy ## 0 1 2 ## 0.09017 0.41906 0.49077 ``` --- ``` r dummy_w <- dummy <- rbinom(n=1e5, size=9, prob=.7) simplehist( dummy_w, xlab="dummy water count" ) ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-42-1.png" width="504" /> These simulations help us represent uncertainty in the observation. When we know `\(p=.7\)`, we know what the next toss will _probably_ be, but we don't know what it will be with certainty. But there is also uncertainty in our parameter estimate. We don't know that `\(p=.7\)`. In fact, our posterior distribution tells us that many values for the parameter `\(p\)` are possible, albeit with varying degrees of likelihood. We want to carry forward this uncertainty. --- Remember, we've sampled possible values of `\(p\)` from our posterior distribution: ``` r sample_post = sample(x = p.grid, prob = posterior, size = 1e4, replace = T) ``` For each of these values of `\(p\)`, we can sample from the binomial distribution: ``` r w <- rbinom( 1e4, size=9, prob=sample_post) length(w) ``` ``` ## [1] 10000 ``` For each value of `\(p\)` we have sampled from the posterior, we have sampled one obsesrvation from the binomial distribution defined by this parameter. --- .pull-left[ Prediction distribution ``` r simplehist(w) ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-45-1.png" width="504" /> ] .pull-right[ Posterior distribution ``` r dens(sample_post) ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-46-1.png" width="504" /> ] --- What are some ways we can evaluate the model? Consider the original sample: W L W W W L W L W We might ask whether the observations are truly independent. That is, does landing on Water once make it more likely you land on Water the next time? For that question, we might consider calculating the longest run. In this case, our longest run is 3. Now we need only ask for the distribution of longest run in our sampled predictions. --- ``` r (x <- rbinom(9, 1, prob=.7)) ``` ``` ## [1] 1 0 1 0 1 1 1 1 1 ``` ``` r rle(x) ``` ``` ## Run Length Encoding ## lengths: int [1:5] 1 1 1 1 5 ## values : int [1:5] 1 0 1 0 1 ``` ``` r max(rle(x)$lengths) ``` ``` ## [1] 5 ``` --- ``` r longest_run = function(p=.7){ max(rle(rbinom(9, 1, p))$lengths) } sample_run = sapply(sample_post, longest_run) simplehist(sample_run) ``` <img src="data:image/png;base64,#lecture01-2_files/figure-html/unnamed-chunk-48-1.png" width="504" />